AMENDMENTS TO THE CLAIMS 

This listing of claims will replace all prior versions, and listings, of claims in the 
application. 



1-10 (Cancelled) 




(Currently Amended) A processor comprising: The processor of claim

a memory to store a first packed data operand having a first plurality of data elements and
a second packed data operand having a second plurality of data elements;

a partial-width packed data instruction to indicate the first packed data operand and the
second packed data operand and to indicate a first operation to be performed on a subset
of corresponding pairs of data elements of the first and the second packed data operands;

a decoder coupled with the memory to receive the partial-width packed data instruction
and to decode the partial-width packed data instruction, wherein the decoder is a decoder
to convert the partial-width packed data instruction into a first micro instruction that
corresponds to a first subset of at least one corresponding pair of data elements of the first
and the second packed data operands and a second micro instruction that corresponds to a
second subset of at least one corresponding pair of data elements of the first and the
second packed data operands; and

a partial-width execution unit coupled with the decoder to execute the operation on the
subset of corresponding pairs of data elements, wherein the partial-width execution unit
is a partial-width execution unit to execute an operation specified by the first micro
instruction on the first subset.
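
As a purely illustrative sketch, and not part of the claim language, the decoding recited in the preceding claim could be modeled in Python roughly as follows. The names MicroOp, decode, execute and OPERATIONS are hypothetical, and the even split of element pairs into a lower and an upper half is an assumption made only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical operation table; the claim does not name specific operations.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
}


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical micro instruction covering some corresponding element pairs."""
    operation: str
    pair_indices: Tuple[int, ...]


def decode(operation: str, num_pairs: int) -> Tuple[MicroOp, MicroOp]:
    """Convert one packed data instruction into two micro instructions.

    Assumption: the pairs are split into a lower half and an upper half.
    """
    if num_pairs < 2:
        raise ValueError("need at least two element pairs to form two subsets")
    half = num_pairs // 2
    first = MicroOp(operation, tuple(range(half)))
    second = MicroOp(operation, tuple(range(half, num_pairs)))
    return first, second


def execute(uop: MicroOp, a: List[int], b: List[int], result: List[int]) -> None:
    """Apply the micro instruction's operation only to the pairs it covers."""
    func = OPERATIONS[uop.operation]
    for i in uop.pair_indices:
        result[i] = func(a[i], b[i])


operand_a = [1, 2, 3, 4]
operand_b = [10, 20, 30, 40]
result = [0, 0, 0, 0]

first_uop, second_uop = decode("add", len(operand_a))
# Only the first micro instruction is executed here, so only the first
# subset of element pairs is touched.
execute(first_uop, operand_a, operand_b, result)
```

In this sketch the second micro instruction is produced by the decoder but left unexecuted, which mirrors the claim's focus on the first subset.
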

12. (Previously Presented) The processor of claim 11, further comprising a port to receive
at least one data element of the first subset and to not receive a data element of the second
subset.



(Previously Presented) The processor of claim

wherein the processor is a processor to eliminate the second micro instruction; and

wherein the processor is a processor to set at least one result data element corresponding 
to the second subset to a predetermined value. 
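
For illustration only, the behavior recited in the preceding claim could be sketched in Python as below. The function name execute_with_elimination and the choice of zero as the predetermined value are assumptions; the claim does not specify which value is used.

```python
from typing import Callable, List, Sequence

PREDETERMINED_VALUE = 0  # assumption: the claim does not say which value is used


def execute_with_elimination(
    a: Sequence[int],
    b: Sequence[int],
    first_subset: Sequence[int],
    second_subset: Sequence[int],
    operation: Callable[[int, int], int],
) -> List[int]:
    """Run the operation on the first subset only.

    The second micro instruction is treated as eliminated: no operation is
    performed for the second subset, and the matching result elements are
    set to the predetermined value instead.
    """
    result = [PREDETERMINED_VALUE] * len(a)
    for i in first_subset:
        result[i] = operation(a[i], b[i])
    for i in second_subset:
        result[i] = PREDETERMINED_VALUE
    return result


scalar_sum = execute_with_elimination(
    [1, 2, 3, 4], [5, 6, 7, 8], [0], [1, 2, 3], lambda x, y: x + y
)
```
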

(Previously Presented) The processor of claim 

further comprising delay circuitry to delay execution of operations on the second subset; 
and 

wherein the partial-width execution unit is a partial-width execution unit to execute an
operation specified by the second partial-width micro instruction on the second subset
after the delay.



(Currently Amended) The processor of claim [[10]]

further comprising a first port coupled with the memory to receive the first packed data
operand and a second port coupled with the memory to substantially simultaneously
receive the second packed data operand;

further comprising divide circuitry to divide the first packed data operand into a first third
subset comprising at least one data element and a second fourth subset comprising at least
one data element and to divide the second packed data operand into a third fifth subset
comprising at least one data element and a fourth sixth subset comprising at least one
data element; and

wherein the partial-width execution unit is a partial-width execution unit to perform the
first operation on at least one corresponding pair of data elements of the first third and the
third fifth subsets to generate at least one resulting data element.

(Currently Amended) The processor of claim 

further comprising delay circuitry to delay the second fourth subset and to delay the
fourth sixth subset; and

wherein after the delay, the partial-width execution unit is a partial-width execution unit
to perform the first operation on at least one corresponding pair of data elements of the
second fourth and the fourth sixth subsets to generate at least one additional resulting data
element.

(Currently Amended) The processor of claim 16, wherein the partial-width execution
unit is a partial-width execution unit to generate at least one additional resulting data
element corresponding to the second fourth and the fourth sixth subsets by setting the at
least one additional resulting data element to a predetermined value.

(Previously Presented) The processor of claim [illegible], wherein the partial-width execution
unit is a partial-width execution unit to execute the packed data instruction and a second
similar packed data instruction on a half clock cycle.

(Previously Presented) The processor of claim 16, wherein the partial-width execution
unit is a 64-bit partial-width execution unit, and wherein the first and the second packed
data operands are 128-bit operands.



(Previously Presented) A method comprising: 

receiving a packed data instruction that specifies memory locations of a first full-width
packed data operand having a plurality of data elements and a second full-width packed
data operand having a corresponding plurality of data elements;

substantially simultaneously accessing the first full-width packed data operand and the
second full-width packed data operand from the memory locations;

dividing the first full-width packed data operand into a first subset of data elements and a
second subset of data elements and dividing the second full-width packed data operand
into a third subset of data elements and a fourth subset of data elements;

performing an operation specified by the packed data instruction on the first and third
subsets of data elements to generate a first resulting one or more data elements;

delaying the second and fourth subsets of data elements;

after said delaying, performing an operation specified by the packed data instruction on
the second and the fourth subsets of data elements to generate a second resulting one or
more data elements; and

storing the first and the second resulting data elements in a common packed data operand.
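
The sequence of steps in the preceding method claim can be sketched, for illustration only, as the Python model below. The function name staggered_execute, the use of a queue to stand in for the delay, and the one-cycle delay length are all assumptions; the claim does not state how long the delay is or how it is implemented.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple


def staggered_execute(
    a: Sequence[int],
    b: Sequence[int],
    operation: Callable[[int, int], int],
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Model dividing, delaying, and storing into one common result."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("operands must have the same, even number of elements")
    half = len(a) // 2

    # Divide each full-width operand into two subsets.
    first_subset, second_subset = a[:half], a[half:]
    third_subset, fourth_subset = b[:half], b[half:]

    # Hypothetical delay element holding the second and fourth subsets.
    delay_line = deque([(second_subset, fourth_subset)])
    schedule: List[Tuple[int, str]] = []

    # Cycle 0: operate on the first and third subsets.
    first_result = [operation(x, y) for x, y in zip(first_subset, third_subset)]
    schedule.append((0, "first and third subsets"))

    # Cycle 1: after the delay, operate on the second and fourth subsets.
    delayed_a, delayed_b = delay_line.popleft()
    second_result = [operation(x, y) for x, y in zip(delayed_a, delayed_b)]
    schedule.append((1, "second and fourth subsets"))

    # Store both partial results in a common packed data operand.
    return first_result + second_result, schedule


packed_result, schedule = staggered_execute(
    [1, 2, 3, 4], [10, 20, 30, 40], lambda x, y: x * y
)
```
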

(Previously Presented) The method of claim [illegible], wherein performing an operation
specified by the [illegible] instruction on the second and fourth subsets comprises setting a
data element to a predetermined value.



(Previously Presented) The method of claim [illegible], wherein dividing includes
dividing a 128-bit packed data operand into a 64-bit segment of two low order data
elements and a 64-bit segment of two high order data elements.
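
As an illustrative sketch only, the division recited in the preceding claim could be expressed with integer masks in Python. The helper names divide_128 and elements_32 are hypothetical, and treating each data element as 32 bits wide is an assumption inferred from two elements sharing a 64-bit segment.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def divide_128(operand: int) -> Tuple[int, int]:
    """Split a 128-bit value into its low and high 64-bit segments."""
    if not 0 <= operand < (1 << 128):
        raise ValueError("operand must fit in 128 bits")
    low_segment = operand & MASK64
    high_segment = (operand >> 64) & MASK64
    return low_segment, high_segment


def elements_32(segment: int) -> List[int]:
    """Return the two 32-bit data elements of a 64-bit segment, low order first."""
    return [segment & MASK32, (segment >> 32) & MASK32]


# Four 32-bit elements 1, 2, 3, 4 packed from low order to high order.
operand = (4 << 96) | (3 << 64) | (2 << 32) | 1
low_segment, high_segment = divide_128(operand)
low_elements = elements_32(low_segment)
high_elements = elements_32(high_segment)
```
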

(Previously Presented) A processor comprising:

a packed data instruction to specify an operation on a plurality of data elements of at least
one packed data operand;

a decoder to generate a first micro instruction and a second micro instruction
corresponding to the packed data instruction, the first micro instruction specifying a first
operation and the second micro instruction specifying a second operation;

an execution unit to execute an operation specified by the first micro instruction on only a
subset of the plurality of packed data elements; and

circuitry to eliminate the second micro instruction.

(Previously Presented) The processor of claim [illegible], wherein the decoder is a decoder to
create the second micro instruction by replicating the first micro instruction to create a
replica and modifying the replica to create the second micro instruction.
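
For illustration only, the replicate-then-modify step in the preceding claim might look like the Python sketch below. The MicroInstruction fields and the idea that the modification selects a different half of the operand are assumptions; the claim does not say what is modified in the replica.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MicroInstruction:
    """Hypothetical micro instruction encoding."""
    opcode: str
    operand_half: str  # assumed field: which half of the operands it targets


def make_micro_instructions(opcode: str) -> Tuple[MicroInstruction, MicroInstruction]:
    """Create the second micro instruction from a modified replica of the first."""
    first = MicroInstruction(opcode=opcode, operand_half="low")
    replica = replace(first)                        # replicate the first micro instruction
    second = replace(replica, operand_half="high")  # modify the replica
    return first, second


first_mi, second_mi = make_micro_instructions("packed_add")
```
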

25. (Previously Presented) The processor of claim 23, wherein the execution unit is an
execution unit to set a data element in a result packed data operand to a predetermined
value.



26. (Previously Presented) A method comprising:

receiving a packed data instruction specifying memory locations of a first packed data
operand and a second packed data operand;

converting the packed data instruction into a first packed data micro instruction and a
second packed data micro instruction;

executing the first packed data micro instruction including accessing only a subset of data
elements of the first and the second packed data operands comprising at least one pair of
corresponding data elements from the first and the second packed data operands and
causing an operation specified by the packed data instruction to be performed on the
subset to produce a resulting one or more data elements; and

executing the second packed data micro instruction including accessing only a subset of
data elements of the first and the second packed data operands comprising at least one
pair of corresponding data elements from the first and the second packed data operands
and causing an operation specified by the packed data instruction to be separately
performed on the subset to produce a resulting one or more additional data elements.

(Previously Presented) The method of claim [illegible], wherein executing the second packed
data micro instruction includes setting a data element of the one or more additional data
elements to a predetermined value.

(Previously Presented) The method of claim 26, further comprising:

writing the resulting one or more data elements to a result packed data operand; and

writing the resulting one or more additional data elements to the same result packed data
operand.

(Previously Presented) The method of claim 26, wherein executing the second packed 
data micro instruction is delayed relative to executing the first packed data micro 
instruction. 



(Currently Amended) A processor comprising: The processor of claim 30, 



a memory to store a first packed data operand and a second packed data operand;

an instruction to indicate the first packed data operand and the second packed data
operand and to indicate an operation to be performed on the first packed data operand and
the second packed data operand;

decoder means for decoding the instruction, wherein the decoder means is a decoder
means for decoding the instruction into a first micro instruction that specifies an
operation on only a portion of the first and the second packed data operands and a second
micro instruction that specifies an operation on only a different portion of the first and the
second packed data operands; and

execution means for executing the instruction.

(Currently Amended) A processor comprising: The processor of claim 30,

a memory to store a first packed data operand and a second packed data operand;

an instruction to indicate the first packed data operand and the second packed data
operand and to indicate an operation to be performed on the first packed data operand and
the second packed data operand;

decoder means for decoding the instruction; and

execution means for executing the instruction, wherein the execution means is an
execution means for performing operations specified by the instruction on a first subset
of corresponding pairs of data elements of the first and the second packed data operands
and after a delay to perform operations specified by the instruction means on a second
subset of corresponding pairs of data elements of the first and the second packed data
operands.

(Previously Presented) A computer system comprising:

a bus;

a storage device including a flash memory coupled to the bus to store data;

a processor coupled to the storage device by the bus to execute instructions;

a memory of the processor to store a first packed data operand having a first plurality of
data elements and a second packed data operand having a second plurality of data
elements;

a decoder of the processor coupled with the memory of the processor, the decoder to
receive a partial-width packed data instruction and to decode the partial-width packed
data instruction, wherein the partial-width packed data instruction indicates the first
packed data operand and the second packed data operand, and indicates a first operation
to be performed on a subset of corresponding pairs of data elements of the first and the
second packed data operands; and

a partial-width execution unit of the processor coupled with the decoder to execute the
operation on the subset of corresponding pairs of data elements.

(Previously Presented) The computer system of claim 33:

wherein the decoder is a decoder to convert the partial-width packed data instruction into
a first micro instruction that corresponds to a first subset of at least one corresponding
pair of data elements of the first and the second packed data operands and a second micro
instruction that corresponds to a second subset of at least one corresponding pair of data
elements of the first and the second packed data operands; and

wherein the partial-width execution unit is a partial-width execution unit to execute an
operation specified by the first micro instruction on the first subset.



(Previously Presented) The computer system of claim 34:

wherein the processor is a processor to eliminate the second micro instruction; and

wherein the processor is a processor to set at least one result data element corresponding
to the second subset to a predetermined value.

(Previously Presented) The computer system of claim 30:

further comprising a first port coupled with the memory to receive the first packed data
operand and a second port coupled with the memory to substantially simultaneously
receive the second packed data operand;

further comprising divide circuitry to divide the first packed data operand into a first
subset comprising at least one data element and a second subset comprising at least one
data element and to divide the second packed data operand into a third subset comprising
at least one data element and a fourth subset comprising at least one data element; and

wherein the partial-width execution unit is a partial-width execution unit to perform the
first operation on at least one corresponding pair of data elements of the first and the third
subsets to generate at least one resulting data element.



(Previously Presented) The computer system of claim

further comprising delay circuitry to delay the second subset and to delay the fourth
subset; and

wherein after the delay, the partial-width execution unit is a partial-width execution unit
to perform the first operation on at least one corresponding pair of data elements of the
second and the fourth subsets to generate at least one additional resulting data element.